Activation
Applies an activation function to every element of the input array. The activation function is selectable; the available functions are described below.
Relu - Standard ReLU function.

\[output_i = \max(0, input_i)\]

Relu6 - Standard ReLU with the output additionally capped at 6.

\[output_i = \min(\max(0, input_i), 6)\]

Clip - Clips the input to the interval [min_val, max_val].

\[output_i = \min(\max(input_i, \text{min\_val}), \text{max\_val})\]

LRelu - Leaky Rectified Linear Unit. It stays linear for positive inputs and keeps a small slope for negative inputs, avoiding the "dying neuron" problem of standard ReLU.

\[output_i = \begin{cases} input_i, & input_i \ge 0 \\ \alpha \cdot input_i, & input_i < 0 \end{cases}\]

Sigmoid - Widely used smooth nonlinear activation (also called the logistic function). It maps any real number into the interval \((0, 1)\) and is commonly used in the output layer of binary classifiers, where the result is interpreted as a probability.

\[output_i = \frac{1}{1 + e^{-input_i}}\]

Tanh - Hyperbolic tangent activation; its output range is \((-1, 1)\).

\[output_i = \tanh(input_i) = \frac{e^{input_i} - e^{-input_i}}{e^{input_i} + e^{-input_i}}\]

HSigmoid - Hard Sigmoid, a piecewise-linear approximation of Sigmoid that is simpler and cheaper to compute.

\[output_i = \text{clip}\left(\frac{input_i + 3}{6}, 0, 1\right)\]

where clip(a, 0, 1) restricts a to the interval \([0, 1]\).

Swish - Self-gated activation function proposed by Google. It combines Sigmoid gating with a linear term and is smooth and non-monotonic.

\[output_i = input_i \cdot \sigma(input_i) = \frac{input_i}{1 + e^{-input_i}}\]

where \(\sigma(x)\) is the standard Sigmoid function. Swish often outperforms ReLU in deep networks.

HSwish - Hard Swish, a piecewise-linear approximation of Swish that is cheap to compute and widely used in mobile models such as MobileNetV3.

\[output_i = input_i \cdot \text{clip}\left(\frac{input_i + 3}{6}, 0, 1\right)\]

where clip(a, 0, 1) restricts a to the interval \([0, 1]\).

HardTanh - Hard Tanh, a piecewise-linear approximation of Tanh. It is simple to compute, has stable gradients, and is often used in quantized or lightweight networks.

\[output_i = \text{clip}(input_i, \text{min\_val}, \text{max\_val})\]

where clip(x, min_val, max_val) outputs min_val when \(x < \text{min\_val}\), max_val when \(x > \text{max\_val}\), and \(x\) itself otherwise.

Gelu - Gaussian Error Linear Unit, a smooth nonlinear activation that combines ReLU-like behavior with a probabilistic weighting. Both the exact and the approximate computation modes are supported; the tanh approximation, proposed by Hendrycks & Gimpel (2016) as a substitute for the exact form \(output_i = input_i\,\Phi(input_i)\), is faster to compute with negligible loss of accuracy.

\[output_i = \begin{cases} 0.5\,input_i \Bigl[ 1 + \tanh\!\Bigl( \sqrt{\tfrac{2}{\pi}}\,(input_i + 0.044715\,input_i^3) \Bigr) \Bigr], & flag = true \\ input_i\,\Phi(input_i) = \tfrac{1}{2}\,input_i \Bigl[ 1 + \mathrm{erf}\!\Bigl(\tfrac{input_i}{\sqrt{2}}\Bigr) \Bigr], & flag = false \end{cases}\]

where \(\Phi(x)\) is the cumulative distribution function of the standard normal distribution.

Softplus - A smooth approximation of ReLU that remains differentiable at zero. For inputs above 88.0 the identity branch is used, since \(e^{input_i}\) would overflow in single precision there and \(\ln(1 + e^{x}) \approx x\) for large \(x\).

\[output_i = \begin{cases} input_i, & input_i > 88.0 \\ \ln(1 + e^{input_i}), & \text{otherwise} \end{cases}\]

Elu - Linear for positive inputs and exponentially saturating for negative inputs; alleviates the "dying neuron" problem of ReLU.

\[output_i = \begin{cases} input_i, & input_i \ge 0 \\ \alpha (e^{input_i} - 1), & input_i < 0 \end{cases}\]

where \(\alpha\) is a hyperparameter, typically \(\alpha = 1.0\).

Celu - Continuously Differentiable ELU, a refinement of ELU that is guaranteed to be continuously differentiable at zero.

\[output_i = \begin{cases} input_i, & input_i \ge 0 \\ \alpha (e^{input_i / \alpha} - 1), & input_i < 0 \end{cases}\]

where \(\alpha\) is a tunable hyperparameter controlling the smoothness of the negative region.

HardShrink - Hard Shrinkage activation, used to sparsify outputs.

\[output_i = \begin{cases} input_i, & |input_i| > \lambda \\ 0, & \text{otherwise} \end{cases}\]

where \(\lambda\) is a threshold constant.

SoftShrink - Soft Shrinkage activation, similar to HardShrink but with a smoother shrinking transition.

\[output_i = \begin{cases} input_i - \lambda, & input_i > \lambda \\ input_i + \lambda, & input_i < -\lambda \\ 0, & \text{otherwise} \end{cases}\]

SoftsignOpt - Optimized Softsign, a smooth squashing function that maps the input into a bounded interval.

\[output_i = \frac{input_i}{1 + |input_i|}\]
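To make the formulas above concrete, here is a minimal scalar golden-model sketch in plain C for a representative subset; the ref_* helper names are hypothetical and not part of the library API. Such a model can be used on the host to validate kernel outputs.

//Scalar golden-model sketch of representative formulas above.
//The ref_* names are hypothetical helpers, not library APIs.
#include <math.h>

#define SQRT_2_OVER_PI 0.7978845608028654f /* sqrt(2/pi) */

static float ref_relu6(float x)    { return fminf(fmaxf(0.0f, x), 6.0f); }
static float ref_hsigmoid(float x) { return fminf(fmaxf((x + 3.0f) / 6.0f, 0.0f), 1.0f); }
static float ref_swish(float x)    { return x / (1.0f + expf(-x)); }

//Gelu: tanh approximation (approximate != 0) vs. exact erf form.
static float ref_gelu(float x, int approximate) {
    if (approximate)
        return 0.5f * x * (1.0f + tanhf(SQRT_2_OVER_PI * (x + 0.044715f * x * x * x)));
    return 0.5f * x * (1.0f + erff(x / sqrtf(2.0f)));
}

//Softplus with the fp32 overflow guard from the definition above.
static float ref_softplus(float x) {
    return (x > 88.0f) ? x : logf(1.0f + expf(x));
}

//Celu: note the exponent is divided by alpha, unlike Elu.
static float ref_celu(float x, float alpha) {
    return (x >= 0.0f) ? x : alpha * (expf(x / alpha) - 1.0f);
}

//SoftShrink: shrinks toward zero by lambda, zeroing the band [-lambda, lambda].
static float ref_softshrink(float x, float lambd) {
    if (x >  lambd) return x - lambd;
    if (x < -lambd) return x + lambd;
    return 0.0f;
}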
- Inputs:
Input0 - address of the input data.
length - number of elements in the array.
args (some activation functions) - activation-specific parameters (only for the functions that take them, e.g. min_val/max_val, alpha, lambd, approximate).
core_mask (int, optional) - core mask (shared-storage versions only).
- Outputs:
output - address of the result.
- Supported platforms:
FT78NE, MT7004

Remarks

FT78NE supports int8 and fp32.
MT7004 supports fp16 and fp32.
Accordingly, the i8_ (int8) variants are only meaningful on FT78NE and the hp_ (fp16) variants only on MT7004; the fp_ (fp32) variants are available on both.
Shared-storage versions:
- void i8_relu_s(int8_t *Input0, int8_t *output, int length, int core_mask)
- void fp_relu_s(float *Input0, float *output, int length, int core_mask)
- void hp_relu_s(half *Input0, half *output, int length, int core_mask)
- void i8_relu6_s(int8_t *Input0, int8_t *output, int length, int core_mask)
- void fp_relu6_s(float *Input0, float *output, int length, int core_mask)
- void hp_relu6_s(half *Input0, half *output, int length, int core_mask)
- void i8_clip_s(int8_t *Input0, int8_t *output, int length, int8_t min_val, int8_t max_val, int core_mask)
- void fp_clip_s(float *Input0, float *output, int length, float min_val, float max_val, int core_mask)
- void hp_clip_s(half *Input0, half *output, int length, half min_val, half max_val, int core_mask)
- void i8_lrelu_s(int8_t *Input0, int8_t *output, int length, float alpha, int core_mask)
- void fp_lrelu_s(float *Input0, float *output, int length, float alpha, int core_mask)
- void hp_lrelu_s(half *Input0, half *output, int length, half alpha, int core_mask)
- void i8_sigmoid_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_sigmoid_s(float *Input0, float *output, int length, int core_mask)
- void hp_sigmoid_s(half *Input0, half *output, int length, int core_mask)
- void i8_tanh_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_tanh_s(float *Input0, float *output, int length, int core_mask)
- void hp_tanh_s(half *Input0, half *output, int length, int core_mask)
- void i8_hsigmoid_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_hsigmoid_s(float *Input0, float *output, int length, int core_mask)
- void hp_hsigmoid_s(half *Input0, half *output, int length, int core_mask)
- void i8_swish_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_swish_s(float *Input0, float *output, int length, int core_mask)
- void hp_swish_s(half *Input0, half *output, int length, int core_mask)
- void i8_hswish_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_hswish_s(float *Input0, float *output, int length, int core_mask)
- void hp_hswish_s(half *Input0, half *output, int length, int core_mask)
- void i8_hardtanh_s(int8_t *Input0, int8_t *output, int length, int8_t min_val, int8_t max_val, int core_mask)
- void fp_hardtanh_s(float *Input0, float *output, int length, float min_val, float max_val, int core_mask)
- void hp_hardtanh_s(half *Input0, half *output, int length, half min_val, half max_val, int core_mask)
- void i8_gelu_s(int8_t *Input0, float *output, int length, int approximate, int core_mask)
- void fp_gelu_s(float *Input0, float *output, int length, int approximate, int core_mask)
- void hp_gelu_s(half *Input0, half *output, int length, int approximate, int core_mask)
- void i8_softplus_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_softplus_s(float *Input0, float *output, int length, int core_mask)
- void hp_softplus_s(half *Input0, half *output, int length, int core_mask)
- void i8_elu_s(int8_t *Input0, float *output, int length, float alpha, int core_mask)
- void fp_elu_s(float *Input0, float *output, int length, float alpha, int core_mask)
- void hp_elu_s(half *Input0, half *output, int length, half alpha, int core_mask)
- void i8_celu_s(int8_t *Input0, float *output, int length, float alpha, int core_mask)
- void fp_celu_s(float *Input0, float *output, int length, float alpha, int core_mask)
- void hp_celu_s(half *Input0, half *output, int length, half alpha, int core_mask)
- void i8_hardshrink_s(int8_t *Input0, int8_t *output, int length, int8_t lambd, int core_mask)
- void fp_hardshrink_s(float *Input0, float *output, int length, float lambd, int core_mask)
- void hp_hardshrink_s(half *Input0, half *output, int length, half lambd, int core_mask)
- void i8_softshrink_s(int8_t *Input0, int8_t *output, int length, int8_t lambd, int core_mask)
- void fp_softshrink_s(float *Input0, float *output, int length, float lambd, int core_mask)
- void hp_softshrink_s(half *Input0, half *output, int length, half lambd, int core_mask)
- void i8_softsignopt_s(int8_t *Input0, float *output, int length, int core_mask)
- void fp_softsignopt_s(float *Input0, float *output, int length, int core_mask)
- void hp_softsignopt_s(half *Input0, half *output, int length, int core_mask)
C call example:
//FT78NE example
#include <stdio.h>
#include <activation.h>

int main(int argc, char* argv[]) {
    float *input0 = (float *)0xA0000000; //input resides in DDR space
    float *output = (float *)0xC0000000;
    int length = 1000;
    int core_mask = 0xff;
    fp_tanh_s(input0, output, length, core_mask);
    return 0;
}
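Functions that take activation parameters pass them between length and core_mask, as the signatures above show. Below is a sketch in the same style (the DDR addresses and mask value are illustrative, carried over from the example above; the mapping of approximate to the flag in the Gelu formula is assumed from the parameter name), calling fp_gelu_s with the tanh approximation selected:

//FT78NE example: Gelu with the tanh approximation (illustrative addresses)
#include <stdio.h>
#include <activation.h>

int main(int argc, char* argv[]) {
    float *input0 = (float *)0xA0000000; //input resides in DDR space
    float *output = (float *)0xC0000000;
    int length = 1000;
    int approximate = 1;  //1: tanh approximation, 0: exact erf form (assumed)
    int core_mask = 0xff; //same mask as above, presumably one bit per core
    fp_gelu_s(input0, output, length, approximate, core_mask);
    return 0;
}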
Private-storage versions:
- void i8_relu_p(int8_t *Input0, int8_t *output, int length)
- void fp_relu_p(float *Input0, float *output, int length)
- void hp_relu_p(half *Input0, half *output, int length)
- void i8_relu6_p(int8_t *Input0, int8_t *output, int length)
- void fp_relu6_p(float *Input0, float *output, int length)
- void hp_relu6_p(half *Input0, half *output, int length)
- void i8_clip_p(int8_t *Input0, int8_t *output, int length, int8_t min_val, int8_t max_val)
- void fp_clip_p(float *Input0, float *output, int length, float min_val, float max_val)
- void hp_clip_p(half *Input0, half *output, int length, half min_val, half max_val)
- void i8_lrelu_p(int8_t *Input0, int8_t *output, int length, float alpha)
- void fp_lrelu_p(float *Input0, float *output, int length, float alpha)
- void hp_lrelu_p(half *Input0, half *output, int length, half alpha)
- void i8_sigmoid_p(int8_t *Input0, float *output, int length)
- void fp_sigmoid_p(float *Input0, float *output, int length)
- void hp_sigmoid_p(half *Input0, float *output, int length)
- void i8_tanh_p(int8_t *Input0, float *output, int length)
- void fp_tanh_p(float *Input0, float *output, int length)
- void hp_tanh_p(half *Input0, float *output, int length)
- void i8_hsigmoid_p(int8_t *Input0, float *output, int length)
- void fp_hsigmoid_p(float *Input0, float *output, int length)
- void hp_hsigmoid_p(half *Input0, float *output, int length)
- void i8_swish_p(int8_t *Input0, float *output, int length)
- void fp_swish_p(float *Input0, float *output, int length)
- void hp_swish_p(half *Input0, float *output, int length)
- void i8_hswish_p(int8_t *Input0, float *output, int length)
- void fp_hswish_p(float *Input0, float *output, int length)
- void hp_hswish_p(half *Input0, float *output, int length)
- void i8_hardtanh_p(int8_t *Input0, int8_t *output, int length, int8_t min_val, int8_t max_val)
- void fp_hardtanh_p(float *Input0, float *output, int length, float min_val, float max_val)
- void hp_hardtanh_p(half *Input0, half *output, int length, half min_val, half max_val)
- void i8_gelu_p(int8_t *Input0, float *output, int length, int approximate)
- void fp_gelu_p(float *Input0, float *output, int length, int approximate)
- void hp_gelu_p(half *Input0, float *output, int length, int approximate)
- void i8_softplus_p(int8_t *Input0, float *output, int length)
- void fp_softplus_p(float *Input0, float *output, int length)
- void hp_softplus_p(half *Input0, float *output, int length)
- void i8_elu_p(int8_t *Input0, float *output, int length, float alpha)
- void fp_elu_p(float *Input0, float *output, int length, float alpha)
- void hp_elu_p(half *Input0, float *output, int length, half alpha)
- void i8_celu_p(int8_t *Input0, float *output, int length, float alpha)
- void fp_celu_p(float *Input0, float *output, int length, float alpha)
- void hp_celu_p(half *Input0, float *output, int length, half alpha)
- void i8_hardshrink_p(int8_t *Input0, int8_t *output, int length, int8_t lambd)
- void fp_hardshrink_p(float *Input0, float *output, int length, float lambd)
- void hp_hardshrink_p(half *Input0, half *output, int length, half lambd)
- void i8_softshrink_p(int8_t *Input0, int8_t *output, int length, int8_t lambd)
- void fp_softshrink_p(float *Input0, float *output, int length, float lambd)
- void hp_softshrink_p(half *Input0, half *output, int length, half lambd)
- void i8_softsignopt_p(int8_t *Input0, float *output, int length)
- void fp_softsignopt_p(float *Input0, float *output, int length)
- void hp_softsignopt_p(half *Input0, float *output, int length)
C call example:
//FT78NE example
#include <stdio.h>
#include <activation.h>

int main(int argc, char* argv[]) {
    float *input0 = (float *)0x10000000; //input resides in DDR space
    float *output = (float *)0x10004000;
    int length = 1000;
    fp_tanh_p(input0, output, length);
    return 0;
}
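Private-storage functions with parameters follow the same pattern, minus core_mask. As a sketch in the same style (addresses illustrative, carried over from the example above), clipping the input to [-1.0, 1.0]:

//FT78NE example: clip to [-1.0, 1.0] (illustrative addresses)
#include <stdio.h>
#include <activation.h>

int main(int argc, char* argv[]) {
    float *input0 = (float *)0x10000000; //input resides in DDR space
    float *output = (float *)0x10004000;
    int length = 1000;
    fp_clip_p(input0, output, length, -1.0f, 1.0f); //min_val, max_val
    return 0;
}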